234 PART 5 Looking for Relationships with Correlation and Regression

Understanding the Basics of Multiple Regression

In Chapter 16, we outline the derivation of the formulas for determining the parameters of a straight line so that the line — defined by an intercept at the Y axis and a slope — comes as close as possible to all the data points (imagine a scatter plot). The term as close as possible is operationalized as a least-squares line, meaning we are looking for the line where the sum of the squares (SSQ) of the vertical distances of each point to the line is the smallest. The SSQ is smaller for the least-squares line than for any other line you could possibly draw.
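As a minimal sketch of this idea, the snippet below fits a least-squares line to some made-up data (not from the book) using the closed-form formulas for the slope and intercept, and then checks that nudging either parameter only makes the SSQ larger:

```python
import numpy as np

# Made-up example data (purely illustrative)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Closed-form least-squares slope and intercept
slope = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
intercept = y.mean() - slope * x.mean()

def ssq(a, b):
    """Sum of squared vertical distances from the points to the line y = a + b*x."""
    return np.sum((y - (a + b * x)) ** 2)

# The fitted line beats any perturbed line on SSQ
best = ssq(intercept, slope)
print(best <= ssq(intercept + 0.1, slope))   # True
print(best <= ssq(intercept, slope + 0.1))   # True
```

No matter how you perturb the fitted intercept or slope, the SSQ can only go up, which is exactly what "least squares" means.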

The same idea can be extended to multiple regression models containing more than one predictor (which estimate more than two parameters). For two predictor variables, you're fitting a plane, which is a flat sheet. Imagine fitting a set of points to this plane in three dimensions (meaning you'd be adding a Z axis to your X and Y). Now, extend your imagination: for more than two predictors, you're fitting a hyperplane to points in four-or-more-dimensional space. Hyperplanes in multidimensional space may sound mind-blowing, but luckily for us, the actual formulas are simple algebraic extensions of the straight-line formulas.
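As a small illustration of the two-predictor case (again with made-up data, not from the book), the snippet below fits a plane by least squares. The design matrix gets a column of 1s for the intercept plus one column per predictor, and `numpy.linalg.lstsq` finds the SSQ-minimizing coefficients:

```python
import numpy as np

# Made-up data with two predictors; y lies near the plane y = 1 + 2*x1 + 0.5*x2
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
y  = np.array([4.1, 5.4, 9.2, 10.4, 14.1, 15.6])

# Design matrix: a column of 1s (intercept) plus one column per predictor
X = np.column_stack([np.ones_like(x1), x1, x2])

# Least-squares fit: same SSQ-minimizing idea, now fitting a plane
coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
intercept, b1, b2 = coefs
print(round(float(b1), 2), round(float(b2), 2))  # close to 2 and 0.5
```

Adding a third or tenth predictor just adds more columns to the design matrix; the fitting machinery is unchanged, which is why the hyperplane case is only an algebraic extension of the straight-line case.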

In the following sections, we define some basic terms related to multiple regression and explain when you should use it.

Defining a few important terms

Multiple regression is formally known as the ordinary multiple linear regression model. What a mouthful! Here's what the terms mean:

» Ordinary: The outcome variable is a continuous numerical variable whose random fluctuations are normally distributed (see Chapter 24 for more about normal distributions).

» Multiple: The model has more than one predictor variable.

» Linear: Each predictor variable is multiplied by a parameter, and these products are added together to estimate the predicted value of the outcome variable. You can also have one more parameter thrown in that isn't multiplied by anything — it's called the constant term or the intercept. The following are examples of linear functions used in regression: